Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update Simpleaf modules, subworkflow #424

Open
wants to merge 40 commits into
base: dev
Choose a base branch
from
Open

Conversation

DongzeHE
Copy link
Member

@DongzeHE DongzeHE commented Jan 22, 2025

Reopen #361 after updating simpleaf central modules. See this PR. I have tested using a 10x 500 dataset. Once the modules' PR is merged, we can start merging this PR

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/scrnaseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@DongzeHE DongzeHE requested a review from fmalmeida January 22, 2025 17:41
@nf-core-bot
Copy link
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.1.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@DongzeHE DongzeHE requested a review from grst January 22, 2025 17:43
Copy link
Member

@grst grst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor things

core.1739377 Outdated Show resolved Hide resolved
modules/local/alevinqc.nf Show resolved Hide resolved
modules/local/mtx_to_h5ad.nf Outdated Show resolved Hide resolved
nextflow.config Show resolved Hide resolved
subworkflows/local/simpleaf.nf Show resolved Hide resolved
workflows/scrnaseq.nf Outdated Show resolved Hide resolved
subworkflows/local/simpleaf.nf Show resolved Hide resolved
@an-altosian
Copy link
Contributor

Hi @grst ,

I think I am pretty happy with the code now. Interestingly, although I did not touch the code for other aligners, all CI tests except those for simpleaf failed.

I tested my changes locally and everything worked. We can discuss linting and the output structure now.

The current output layout is as the following. The biggest change is I removed the alevinqc folder and exported the alevinqc report to the directory of each sample.

`-- simpleaf
    |-- Sample_X
    |   |-- simpleaf_qc_report_Sample_X.html
    |   |-- simpleaf_quant
    |   |   |-- af_map
    |   |   |   |-- map.rad
    |   |   |   |-- map_info.json
    |   |   |   `-- unmapped_bc_count.bin
    |   |   `-- af_quant
    |   |       |-- alevin
    |   |       |   |-- quants.h5ad
    |   |       |   |-- quants_mat.mtx
    |   |       |   |-- quants_mat_cols.txt
    |   |       |   `-- quants_mat_rows.txt
    |   |       |-- collate.json
    |   |       |-- featureDump.txt
    |   |       |-- generate_permit_list.json
    |   |       |-- map.collated.rad
    |   |       |-- permit_freq.bin
    |   |       |-- permit_map.bin
    |   |       |-- quant.json
    |   |       `-- unmapped_bc_count_collated.bin
    |   `-- versions.yml
    |-- Sample_Y
    |   |-- simpleaf_qc_report_Sample_Y.html
    |   |-- simpleaf_quant
    |   |   |-- af_map
    |   |   |   |-- map.rad
    |   |   |   |-- map_info.json
    |   |   |   `-- unmapped_bc_count.bin
    |   |   `-- af_quant
    |   |       |-- alevin
    |   |       |   |-- quants.h5ad
    |   |       |   |-- quants_mat.mtx
    |   |       |   |-- quants_mat_cols.txt
    |   |       |   `-- quants_mat_rows.txt
    |   |       |-- collate.json
    |   |       |-- featureDump.txt
    |   |       |-- generate_permit_list.json
    |   |       |-- map.collated.rad
    |   |       |-- permit_freq.bin
    |   |       |-- permit_map.bin
    |   |       |-- quant.json
    |   |       `-- unmapped_bc_count_collated.bin
    |   `-- versions.yml
    `-- mtx_conversions
        |-- Sample_X
        |   |-- Sample_X_raw_matrix.h5ad
        |   |-- Sample_X_raw_matrix.sce.rds
        |   `-- Sample_X_raw_matrix.seurat.rds
        |-- Sample_Y
        |   |-- Sample_Y_raw_matrix.h5ad
        |   |-- Sample_Y_raw_matrix.sce.rds
        |   `-- Sample_Y_raw_matrix.seurat.rds
        |-- combined_raw_matrix.h5ad
        |-- combined_raw_matrix.sce.rds
        `-- combined_raw_matrix.seurat.rds

Please let me know what you think! We are getting close!

@DongzeHE DongzeHE marked this pull request as ready for review February 8, 2025 17:38
Copy link
Member

@grst grst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few final minor things. Happy to merge once those are addressed!

Also, please update the CHANGELOG :)

conf/modules.config Outdated Show resolved Hide resolved
conf/modules.config Outdated Show resolved Hide resolved
Comment on lines 62 to 67
if "gene_symbol" in adata.var.columns:
adata.var['gene_ids'] = adata.var['gene_symbol']
else:
adata.var['gene_ids'] = adata.var['gene_id']

adata.var['gene_versions'] = adata.var['gene_ids']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how the anndata generated by simpleaf looks like, so just commenting to be sure we are on the same page. For consistency across all aligners, in scrnaseq, we expect

  • adata.var_names are always (ensembl) gene ids without version suffix.
  • adata.var["gene_symbol"] contains human-readable gene symbols/names. They don't need to be unique
  • adata.var["gene_versions"] may contain ENSG IDs including the gene version.

adata.var can contain arbitrary other columns, but I'd avoid redundancies. E.g. we don't need gene_id and gene_ids and the same in the index, just get rid of the redundant columns in that case.

modules/local/alevinqc.nf Show resolved Hide resolved
@an-altosian
Copy link
Contributor

I think I addressed all your comments. As this is my first PR to scrnaseq and I made many major changes, before we merge it, can we invite more reviewers? It will be great if other maintainers can go through the changes, especially the document part.

Thanks,
Dongze

@grst
Copy link
Member

grst commented Feb 11, 2025

Thanks for the updates!
Sure, let's first try @nictru!

@grst grst requested a review from nictru February 11, 2025 07:17
@fmalmeida
Copy link
Contributor

I cannot promise, but I can try to find some time to review.
And, many thanks for the PR, @DongzeHE

Copy link
Contributor

@fmalmeida fmalmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi there,
Thanks for the work on it.

I have added a few sincere questions and some comments for changes (if y'all agree) :)

conf/modules.config Outdated Show resolved Hide resolved
modules/local/alevinqc.nf Show resolved Hide resolved
nextflow.config Show resolved Hide resolved
subworkflows/local/simpleaf.nf Show resolved Hide resolved
subworkflows/local/simpleaf.nf Show resolved Hide resolved
{assert new File( "${outputDir}/results_simpleaf/simpleaf/Sample_X/simpleaf_quant/af_quant/alevin/quants_mat.mtx" ).exists()},
{assert new File( "${outputDir}/results_simpleaf/simpleaf/Sample_X/simpleaf_quant/af_quant/alevin/quants.h5ad" ).exists()},
{assert new File( "${outputDir}/results_simpleaf/simpleaf/mtx_conversions/Sample_Y/Sample_Y_raw_matrix.h5ad" ).exists()},
{assert new File( "${outputDir}/results_simpleaf/simpleaf/Sample_Y/simpleaf_quant/af_quant/alevin/quants_mat_cols.txt" ).exists()},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is any of these alevin files now possible to have as snaps with the new version?
I remember before they could vary the sorting but not sure about the newest version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For mtx and its column and row names, unfortunately they can still vary because of parallelization. For the h5ad file, what I can do is sorting it in the mtx_to_h5ad module to make the order fixed. Do you think it is necessary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Generally, I do not think is necessary. But, if it would be possible to have it being part of the snaps, it would surely add robustness.

@grst , do you think this should be here or is the PR already big enough and better to have another?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean having a snapshot is obviously better than not having one, but I won't insist on it.

workflows/scrnaseq.nf Outdated Show resolved Hide resolved
workflows/scrnaseq.nf Show resolved Hide resolved
workflows/scrnaseq.nf Outdated Show resolved Hide resolved
Copy link
Contributor

@nictru nictru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job in general - just a few remarks :)

subworkflows/local/simpleaf.nf Outdated Show resolved Hide resolved
subworkflows/local/simpleaf.nf Outdated Show resolved Hide resolved
nextflow.config Show resolved Hide resolved
@an-altosian
Copy link
Contributor

OK I think I have address all comments but two:

  1. Exposing other cell filtering strategies: thread
  2. testing: thread

For 1, I can add two parameters and implement the logic, not a big deal.

For 2, I realized that I did not change the test file name so simpleaf was not tested. I ran tests locally, everything worked well, but a weird bug jumped out in GitHub Actions:

Local test log

nf-test test tests/main_pipeline_simpleaf.nf.test --ci

? nf-test 0.9.2
https://www.nf-test.com
(c) 2021 - 2024 Lukas Forer and Sebastian Schoenherr

nf-test runs in CI mode.

Test Workflow main.nf

  Test [bd8e613a] 'test-dataset_simpleaf_aligner' PASSED (124.819s)


SUCCESS: Executed 1 tests in 125.814s

GitHub Actions workflow error:
https://github.com/nf-core/scrnaseq/actions/runs/13277650102/job/37070051243?pr=424#step:9:199

I could not reproduce this error locally. Any suggestions how I can address it? Could you download the artifact of the failed job so that I can jump into it?

@an-altosian
Copy link
Contributor

an-altosian commented Feb 12, 2025

So it turns out that the error comes from this line in simpleaf, caused by the the internal mtx to h5ad conversion (this line) where pola-rs encountered an empty featureDump.txt (this line).

It is strange because this file, generated by alevin-fry in this line, should at least have its header. The logic here is, Simpleaf first asks alevin-fry to generate this file, then read this file and add it into the h5ad output. So, this file should definitely be there.

In some runs, this file was there but I got different md5sums of sorted h5ad files. This also doesn't make sense because, although the columns and rows can swap, the counts of a specific gene in a specific cell should be consistent.

@rob-p - Do you have any idea what happened here?

@fmalmeida
Copy link
Contributor

Super. I do not think I have any other comment, besides the one in the nf-tests which would be good to have but not necessary.
So I am now approving.
Thanks for the great work here 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants